A starting point: Introductory comments

Description

A brief introduction to the R/qtl package, with a walk-through of an analysis.

Arguments

New to R and/or R/qtl?

In order to use the R/qtl package, you must type (within R)library(qtl). You may wish to include this in a.Firstfunction or.Rprofilefile.
Documention and several tutorials are available at the R archive (http://cran.r-project.org).
Use thehelp.startfunction to start the html version of the R help. In Windows, you may wish to useoptions(htmlhelp=TRUE)to get easy access to the html version of the help files; this could be included in a.Firstfunction or.Rprofilefile.
Typelibrary(help=qtl)to get a list of the functions in R/qtl.
Use theexamplefunction to run examples of the various functions in R/qtl.
A tutorial on the use of R/qtl is distributed with the package,rqtltour.pdf.
Download the latest version of R/qtl fromhttp://biosun01.biostat.jhsph.edu/~kbroman/qtl.

Walk-through of an analysis

Here we briefly describe the use of R/qtl to analyze an experimental cross. A more extensive tutorial on its use is distributed with the package, rqtltour.pdf.

A difficult first step in the use of most data analysis software is the import of data. With R/qtl, one may import data in several different formats by use of the function read.cross. The internal data structure used by R/qtl is rather complicated, and is described in the help file for read.cross. We won't discuss data import any further here, except to say that the comma-delimited format ("csv") is recommended. If you have trouble importing data, send an email to Karl Broman, kbroman@jhsph.edu, perhaps attaching examples of your data files. (Such data will be kept confidential.)

We consider the example data hyper, an experiment on hypertension in the mouse, kindly provided by Bev Paigen and Gary Churchill. Use the data function to load the data.

data(hyper)

The hyper data set has class cross. The function summary.cross gives some summary information on the data, and checks the data for internal consistency. A number of other utility functions are available; hopefully these are self-explanatory.

summary(hyper) nind(hyper) nphe(hyper) nchr(hyper) nmar(hyper) totmar(hyper)

The function plot.cross gives a graphical summary of the data; it calls plot.missing (to plot a matrix displaying missing genotypes) and plot.map (to plot the genetic maps), and also displays histograms or barplots of the phenotypes. The plot.missing function can plot individuals ordered by their phenotypes; you can see that for most markers, only individuals with extreme phenotypes were genotyped.

plot(hyper) plot.missing(hyper) plot.missing(hyper, reorder=TRUE) plot.map(hyper)

Note that one marker (on chromosome 14) has no genotype data. The function drop.nullmarkers removes such markers from the data.

hyper <- drop.nullmarkers(hyper) totmar(hyper)

The function est.rf estimates the recombination fraction between each pair of markers, and calculates a LOD score for the test of $r$ = 1/2. This is useful for identifying markers that are placed on the wrong chromosome. Note that since, for these data, many markers were typed only on recombinant individuals, the pairwise recombination fractions show rather odd patterns.

hyper <- est.rf(hyper) plot.rf(hyper) plot.rf(hyper, chr=c(1,4))

To re-estimate the genetic map for an experimental cross, use the function est.map. The function plot.map, in addition to plotting a single map, can plot the comparison of two genetic maps (as long as they are composed of the same numbers of chromosomes and markers per chromosome). The function replace.map map be used to replace the genetic map in a cross with a new one.

newmap <- est.map(hyper, error.prob=0.01, trace=TRUE) plot.map(hyper, newmap) hyper <- replace.map(hyper, newmap)

Before doing QTL analyses, a number of intermediate calculations may need to be performed. The function calc.genoprob calculates conditional genotype probabilities given the multipoint marker data. sim.geno simulates sequences of genotypes from their joint distribution, given the observed marker data. argmax.geno calculates the most likely sequence of underlying genotypes, given the observed marker data.

These three functions return a modified version of the input cross, with the intermediate calculations included.

hyper <- calc.genoprob(hyper, step=2.5, error.prob=0.01) hyper <- sim.geno(hyper, step=2.5, n.draws=64, error.prob=0.01) hyper <- argmax.geno(hyper, error.prob=0.01)

The function calc.errorlod may be used to assist in identifying possible genotyping errors; it calculates the error LOD scores described by Lincoln and Lander (1992). It requires the results of calc.genoprob, run with a matching value for the assumed genotyping error probability, error.prob.

hyper <- calc.errorlod(hyper, error.prob=0.01) plot.errorlod(hyper) top.errorlod(hyper) plot.errorlod(hyper, chr=c(4,11,16))

The function plot.geno may be used to inspect the observed genotypes for a chromosome, with likely genotyping errors flagged.

plot.geno(hyper, chr=16, ind=71:90, min.sep=4)

The function scanone performs a genome scan with a single QTL model. By default, it performs standard interval mapping (Lander and Botstein 1989): use of a normal model and the EM algorithm. If one specifies method="hk", Haley-Knott regression is performed (Haley and Knott 1992). These two methods require the results from calc.genoprob.

out.em <- scanone(hyper) out.hk <- scanone(hyper, method="hk")

If one specifies method="imp", a genome scan is performed by the multiple imputation method of Sen and Churchill (2001). This method requires the results from sim.geno.

out.imp <- scanone(hyper, method="imp")

The output of scanone is a data.frame with class scanone. The function plot.scanone may be used to plot the results, and may plot up to three sets of results against each other, as long as they conform appropriately.

plot(out.em) plot(out.hk, col="blue", add=TRUE) plot(out.imp, col="red", add=TRUE) plot(out.hk, out.imp, out.em, chr=c(1,4), lty=1, col=c("blue","red","black"))

The function summary.scanone may be used to list information on the peak LOD for each chromosome for which the LOD exceeds a specified threshold.

summary(out.em) summary(out.em, 3) summary(out.hk, 3) summary(out.imp, 3)

One may also use scanone to perform a permutation test to get a genome-wide LOD significance threshold. This will take some time, so maybe now is a good time to go for a cup of coffee.

operm.hk <- scanone(hyper, method="hk", n.perm=100) quantile(operm.hk, 0.95)

We should say at this point that the function save.image will save your workspace to disk. You'll wish you had used this if R crashes.

save.image()

The function scantwo performs a two-dimensional genome scan with a two-QTL model. Methods "em", "hk" and "imp" are all available. scantwo is considerably slower than link[qtl]{scanone}, and can require a great deal of memory. Thus, you may wish to create a version of hyper for a more coarse grid.

hyper.coarse <- calc.genoprob(hyper, step=10, err=0.01) hyper.coarse <- sim.geno(hyper, step=10, n.draws=64, err=0.01) out2.hk <- scantwo(hyper.coarse, method="hk") out2.em <- scantwo(hyper.coarse) out2.imp <- scantwo(hyper.coarse, method="imp")

The output is an object with class scantwo. The function plot.scantwo may be used to plot the results. The upper triangle contains LOD scores for tests of epistasis, while the lower triangle contains joint LOD scores.

plot(out2.hk) plot(out2.em) plot(out2.imp)

The function summary.scantwo lists the interesting aspects of the output. One provides three LOD thresholds: for the joint LOD, epistasis LOD, and conditional, single-QTL LOD scores. The locus pairs giving the highest joint LOD for each pair of chromosomes are inspected, and those whose LOD score exceed the joint LOD threshold and for which either the interaction LOD exceeds its threshold or both the conditional single-QTL LODs exceed their threshold, are printed.

summary(out2.em, c(8, 3, 3)) summary(out2.em, c(0, 1000, 4)) summary(out2.em, c(0, 4, 1000))

Permutation tests may also performed with scantwo; it may take a few days of CPU time. The output is a matrix with two columns: the maximum joint and epistasis LODs, across the two-dimensional genome scan, for each simulation replicate.

operm2 <- scantwo(hyper.coarse, method="hk", n.perm=100) apply(operm2, 2, quantile, 0.95) hist(operm.hk,breaks=20)

References

Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315--324.

Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185--199.

Sen, S. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371--387.